ANOVA, work on the same general principle. They compare the size of the effect seen in your sample against the size of the random fluctuations present in your sample. We describe individual statistical significance tests in detail throughout this book. Here, we describe the generic steps that underlie all the common statistical tests of significance.

1. Reduce your raw sample data down into a single number called a test statistic.

Each test statistic has its own formula, but in general, the test statistic represents the magnitude of the effect you’re looking for relative to the magnitude of the random noise in your data. For example, the test statistic for the unpaired Student t test for comparing means between two groups is calculated as a fraction:
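One standard way to write this fraction (shown here as a reconstruction, since the original formula isn't reproduced in this text; the unequal-variance form is used, and a pooled-variance version is also common) is

t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}

where \bar{X}_1 and \bar{X}_2 are the two group means, s_1 and s_2 are the standard deviations within each group, and n_1 and n_2 are the group sizes.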

The numerator is a measure of the effect, which is the mean difference between the two groups. And the denominator is a measure of the random noise in your sample, which is represented by the spread of values within each group. Thinking about this fraction philosophically, you will notice that the larger the observed effect is (numerator) relative to the amount of random noise in your data (denominator), the larger the Student t statistic will be.
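
To make the fraction concrete, here is a minimal sketch in Python (not from the book) that calculates the unpaired t statistic by hand for two small samples; the group values and the unequal-variance form of the denominator are assumptions chosen only for illustration.

import numpy as np

group_a = np.array([5.1, 4.8, 6.0, 5.5, 5.9])   # hypothetical measurements, group 1
group_b = np.array([4.2, 4.6, 4.9, 4.4, 4.8])   # hypothetical measurements, group 2

effect = group_a.mean() - group_b.mean()          # numerator: difference between group means
noise = np.sqrt(group_a.var(ddof=1) / len(group_a)
                + group_b.var(ddof=1) / len(group_b))   # denominator: spread within each group

t_stat = effect / noise
print(f"t = {t_stat:.2f}")   # a bigger effect relative to the noise gives a bigger t

The larger the difference between the group means, and the tighter the values within each group, the larger t_stat comes out.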

2. Determine how likely (or unlikely) it is for random fluctuations to produce a test statistic as large as the one you actually got from your data.

To do this, you use complicated formulas to place the test statistic from Step 1 on a probability distribution. The distribution describes how much the test statistic bounces around if only random fluctuations are present (that is, if the null hypothesis, H₀, is true). For example, the Student t statistic is placed on the Student t distribution. The probability, read from this distribution, of getting a test statistic at least as large as the one you actually got is known as the p value, which is described in the next section.
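
A minimal Python sketch (not part of the book) of this step, using the same hypothetical samples as before: scipy places the t statistic on the Student t distribution and returns the resulting p value; the unequal-variance (Welch) form of the test is an assumption here.

import numpy as np
from scipy import stats

group_a = np.array([5.1, 4.8, 6.0, 5.5, 5.9])
group_b = np.array([4.2, 4.6, 4.9, 4.4, 4.8])

# ttest_ind computes the t statistic and then reads off, from the Student t
# distribution, the probability of getting a statistic at least this extreme
# by chance alone (two-sided)
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")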

Understanding the meaning of “p value” as the result of a test

The end result of a statistical significance test is a p value, which represents the probability that random fluctuations alone could have produced results at least as large as the ones you observed. If that probability is medium to high, the interpretation is that the null hypothesis, or H₀, can’t be rejected. If that probability is very low, then the interpretation is that we reject the null hypothesis and accept the alternate hypothesis (H₁) as correct. If you find yourself rejecting the null, you can say that the effect seen in your data is statistically significant.

How small should a p value be before we reject the null hypothesis? The technical answer is that the cutoff is arbitrary and depends on how much of a risk you’re willing to take of being fooled by random fluctuations (that is, of making a Type I error). But in practice, the value of 0.05 has become accepted as a reasonable criterion for declaring significance, meaning we fail to reject the null for p values of 0.05 or greater. If you adopt the criterion that p must be less than 0.05 to reject the null hypothesis and declare your effect statistically significant, this is known as setting alpha (α) to 0.05, and it limits your chance of making a Type I error to no more than 5 percent.
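
The decision rule described above can be summarized in a few lines of Python (a hypothetical example, not from the book; the p value shown is made up):

alpha = 0.05        # your accepted risk of a Type I error
p_value = 0.0312    # hypothetical p value from a significance test

if p_value < alpha:
    print("Reject the null hypothesis: the effect is statistically significant.")
else:
    print("Fail to reject the null hypothesis.")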

Examining Type I and Type II errors